feat(github): Add cursor-based pagination to integration repos endpoint#112591
0bce46d to 41c5d02
return response
return self.respond({"detail": "Repositories not supported"}, status=400)
It might be worth seeing whether this can be worked into our generic self.paginate interface. I don't know if it will work well for an API call like this, but it could be worth a quick try.
    per_page = max(1, min(int(request.GET["per_page"]), 100))
except (ValueError, TypeError):
    per_page = 100
cursor = self._parse_cursor(request)
I feel like we should already have API methods that handle producing our cursors. Even if you're unable to use the built-in self.paginate method, it might be worth looking into those.
Switched to self.get_per_page() and self.get_cursor_from_request() in the latest push (bc5a262).
The full self.paginate() flow doesn't entirely work here, since it expects a Django Paginator with a queryset and we're paginating over an external API with a custom response shape ({"repos": [...], "searchable": bool}).
So the CursorResult construction for Link headers is still manual, but the input parsing now uses the standard helpers.
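For readers following along, here is a rough sketch of what "manual" cursor handling for Link headers can look like. It is not the code from this PR: PageCursor, parse_cursor, and build_link_header are hypothetical names, and the value:offset:is_prev cursor layout and the Link header fields are assumptions modeled on the 0:-5:0 style cursors mentioned in the commit messages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageCursor:
    """Hypothetical stand-in for a 'value:offset:is_prev' cursor."""

    value: int
    offset: int
    is_prev: bool

    def __str__(self) -> str:
        return f"{self.value}:{self.offset}:{int(self.is_prev)}"


def parse_cursor(raw: Optional[str]) -> PageCursor:
    """Parse a cursor string, falling back to the first page on bad input."""
    first_page = PageCursor(0, 0, False)
    if not raw:
        return first_page
    try:
        # A wrong number of parts or a non-integer part raises ValueError.
        value, offset, is_prev = (int(part) for part in raw.split(":"))
    except ValueError:
        return first_page
    # Clamp so a crafted cursor such as "0:-5:0" cannot yield a negative offset.
    return PageCursor(value, max(0, offset), bool(is_prev))


def build_link_header(
    base_url: str, current: PageCursor, per_page: int, has_next: bool
) -> str:
    """Build a Link header value with 'previous' and 'next' entries."""
    prev_cursor = PageCursor(0, max(0, current.offset - per_page), True)
    next_cursor = PageCursor(0, current.offset + per_page, False)
    links = [
        (prev_cursor, "previous", current.offset > 0),
        (next_cursor, "next", has_next),
    ]
    return ", ".join(
        f'<{base_url}?cursor={cursor}&per_page={per_page}>; rel="{rel}"; '
        f'results="{str(has_results).lower()}"; cursor="{cursor}"'
        for cursor, rel, has_results in links
    )
```

The input parsing half of this is what the standard helpers now cover; only the header construction would remain hand-written.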
This is acceptable for infinite-scroll consumers.
"""
client = self.get_client()
all_repos = client.get_repos_cached()
If we're just fetching all of these anyway, is there much value in paginating on the backend? I know we save on serialization and network cost, but we're still pulling all of these items into memory.
Good call, forgot that /installation/repositories already supports page + per_page params and returns total_count (our existing interactions with that endpoint just aggregate all pages).
Going to rework this to just make a single GH API call per page instead of caching the full list and slicing.
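A small sketch of the arithmetic this implies, assuming the cursor carries a zero-based item offset that is always a multiple of per_page and that the upstream response exposes total_count. The helper names are hypothetical and not taken from the PR.

```python
def page_for_offset(offset: int, per_page: int) -> int:
    """Map a zero-based item offset to the one-based page number upstream."""
    return offset // per_page + 1


def has_next_page(offset: int, per_page: int, total_count: int) -> bool:
    """True when items remain beyond the page that starts at `offset`."""
    return offset + per_page < total_count
```

With per_page=100 and total_count=250, offsets 0, 100, and 200 map to pages 1, 2, and 3, and only the last of those reports no next page.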
Starting to think this pagination is not worth the trouble. The filtering we apply after fetching a page (installable_only, not "archived") is a problem, because a filtered page can end up with fewer items than per_page.
The only thing I can think of that would solve this is to have get_repositories_paginated keep fetching and filtering pages until it assembles a full page of filtered results.
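To make the fetch-and-filter idea concrete, here is a minimal sketch under stated assumptions: fetch_page(page, per_page) is a hypothetical callable that returns one upstream page as a list, keep is the post-fetch filter, and the resume position is a (page, skip) pair rather than a plain offset, so items left over on a partially consumed upstream page are not lost. None of these names come from the PR.

```python
from typing import Any, Callable, Optional

Repo = dict[str, Any]
Position = tuple[int, int]  # (one-based upstream page, raw items already consumed)


def collect_filtered_page(
    fetch_page: Callable[[int, int], list[Repo]],
    keep: Callable[[Repo], bool],
    page: int,
    skip: int,
    per_page: int,
    max_fetches: int = 5,
) -> tuple[list[Repo], Optional[Position]]:
    """Fetch upstream pages until per_page filtered items are collected.

    Returns the filtered items and the position to resume from, or None
    when the upstream list is exhausted. max_fetches bounds the number of
    upstream calls per request, so a page may still come back short.
    """
    results: list[Repo] = []
    for _ in range(max_fetches):
        raw = fetch_page(page, per_page)
        for index in range(skip, len(raw)):
            if keep(raw[index]):
                results.append(raw[index])
                if len(results) == per_page:
                    # Resume right after the last item we returned.
                    return results, (page, index + 1)
        if len(raw) < per_page:
            return results, None  # a short upstream page means no more data
        page, skip = page + 1, 0
    return results, (page, skip)
```

The trade-off is that one request to our endpoint can fan out into several upstream calls, which is why the sketch caps them with max_fetches.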
implement pagination (callers should fall back to
``get_repositories()``).
"""
return None
Probably raise NotImplementedError here?
Good point. The call site is already in a try/except anyway, so NotImplementedError would just be another clause. Will switch to that.
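Roughly, the fallback could then look like the sketch below. The classes are simplified stand-ins rather than the real integration classes, and the (offset, per_page) signature, the (repos, has_next) return value, and the list_repos helper are all assumptions made for illustration.

```python
from typing import Any


class BaseIntegration:
    """Simplified stand-in for a provider that does not paginate."""

    def get_repositories(self) -> list[dict[str, Any]]:
        return [{"name": "a"}, {"name": "b"}, {"name": "c"}]

    def get_repositories_paginated(
        self, offset: int, per_page: int
    ) -> tuple[list[dict[str, Any]], bool]:
        # Providers that support paginated browsing override this method.
        raise NotImplementedError


class PagingIntegration(BaseIntegration):
    """Stand-in for a provider that supports paginated browsing."""

    def get_repositories_paginated(self, offset, per_page):
        repos = self.get_repositories()
        return repos[offset : offset + per_page], offset + per_page < len(repos)


def list_repos(integration: BaseIntegration, offset: int, per_page: int) -> dict[str, Any]:
    """Use the paginated path when available, else return the full list."""
    try:
        repos, has_next = integration.get_repositories_paginated(offset, per_page)
    except NotImplementedError:
        return {"repos": integration.get_repositories(), "has_next": False}
    return {"repos": repos, "has_next": has_next}
```

Compared with returning None, the exception keeps the return type of the paginated method uniform, so the call site never has to check for a sentinel value.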
Add a new /repos-paginated/ endpoint that paginates over the cached accessible repo list for GitHub. Other providers fall back to returning the full list from get_repositories().

This is purpose-built for the SCM onboarding repo selector dropdown, where we need fast initial page loads via cursor-based pagination. Search is intentionally excluded -- the FE uses the existing /repos/ endpoint with accessibleOnly for that.

The pagination interface is provider-agnostic:
- BaseRepositoryIntegration.get_repositories_paginated() returns None by default (unsupported)
- GitHubIntegration overrides it to slice the Django-cached repo list
- The endpoint dispatches based on the return value

Refs VDY-46
This reverts commit fe092f3.
Add opt-in pagination to the existing /repos/ endpoint, triggered when a caller sends the per_page query param without a search query. Existing consumers that don't send per_page continue to receive the full repo list unchanged.

The pagination interface is provider-agnostic:
- BaseRepositoryIntegration.get_repositories_paginated() returns None by default (provider does not support pagination)
- GitHubIntegration overrides it to slice the Django-cached repo list
- The endpoint dispatches based on the return value, falling back to get_repositories() when pagination is not supported

This supports the SCM onboarding repo selector, which needs fast page-at-a-time loading for orgs with many GitHub repos.

Refs VDY-46
Promote the local to_repo_info closure to a method on GitHubIntegration so both get_repositories and get_repositories_paginated share the same formatting logic. Refs VDY-46
Clamp per_page to at least 1 to prevent infinite empty pages (per_page=0) or incorrect has_next (per_page=-1). Add docstring to get_repositories_paginated noting it always serves from the cached accessible-repos list. Refs VDY-46
Wrap get_repositories_paginated in try/except for IntegrationError and IdentityNotValid so token revocations return 400 instead of 500. Validate per_page as an integer (non-numeric input like ?per_page=abc defaults to 100 instead of raising ValueError). Clamp cursor offset to >= 0 so crafted cursors like 0:-5:0 cannot produce unexpected slicing behavior. Refs VDY-46
Add tests for input validation edge cases: per_page=0, per_page=-1, per_page=abc, per_page=200, negative cursor offset, and IntegrationError in the paginated path. Document that accessibleOnly is ignored in the paginated path and that CursorResult is only used for Link header generation. Remove redundant return_value=[] from test patch decorators. Refs VDY-46
Document cache consistency caveat in get_repositories_paginated docstring: cache expiry between pages can cause duplicates/skips, acceptable for infinite-scroll. Make _parse_cursor a staticmethod. Add comment clarifying per_page and offset are guaranteed set when paginated is not None. Refs VDY-46
Fix references to the old get_accessible_repos_cached name in get_repositories_paginated and pagination tests after rebase.
…icing

Paginated repo browsing now makes a single GitHub API call per page via GET /installation/repositories?page=N&per_page=M instead of fetching all repos into a cache and slicing. This avoids loading the full repo list into memory on every request.

Also switches to the built-in get_per_page() and get_cursor_from_request() helpers, raises NotImplementedError in the base class instead of returning None, and removes the custom _parse_cursor method.

Refs VDY-46
Document that archived repos and installableOnly filtering can cause pages to have fewer items than per_page and has_next to overestimate. Add test for NotImplementedError fallback when a provider does not support paginated browsing. Refs VDY-46
e5fc935 to 000570b
Backend Test Failures
Failures on
Summary
- Add opt-in pagination to GET /organizations/{org}/integrations/{id}/repos/, triggered when a caller sends per_page without search
- Add get_repositories_paginated() to BaseRepositoryIntegration (returns None by default) with a GitHub override that slices the Django-cached repo list
- Existing consumers that don't send per_page continue to receive the full repo list unchanged

This supports the SCM onboarding repo selector, which needs fast page-at-a-time loading for orgs with many GitHub repos.
Stacks on #112327 (VDY-68 caching).
Test plan
- OrganizationIntegrationReposTest tests pass (no behavior change for current consumers)
- OrganizationIntegrationReposPaginatedTest tests verify:
  - installableOnly filter works with pagination
  - Requests without per_page use the full list (existing behavior)
  - Requests with search + per_page use the full list (search bypasses pagination)

Refs VDY-46